Statistical Tests for Comparing Supervised Classification Learning Algorithms
Author

Abstract
This paper reviews five statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (Type I error). Two widely used statistical tests are shown to have high probability of Type I error in certain situations and should never be used. These tests are (a) a test for the difference of two proportions and (b) a paired-differences t test based on taking several random train/test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of Type I error. A fourth test, McNemar's test, is shown to have low Type I error. The fifth test is a new test, 5x2cv, based on 5 iterations of 2-fold cross-validation. Experiments show that this test also has good Type I error. The paper also measures the power (ability to detect algorithm differences when they do exist) of these tests. The 5x2cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable Type I error. For algorithms that can be executed ten times, the 5x2cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
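The two tests the abstract recommends are simple to compute once the error counts and per-fold error-rate differences have been collected. The following is a minimal sketch (not the paper's own code) of McNemar's test with continuity correction and the 5x2cv paired t statistic, using NumPy and SciPy; the function names and the assumption that the inputs have already been gathered from the trained classifiers are illustrative.

```python
import numpy as np
from scipy import stats

def mcnemar_test(n01, n10):
    """McNemar's test with continuity correction.

    n01: number of test examples misclassified by algorithm A but not by B.
    n10: number of test examples misclassified by B but not by A.
    Returns the p-value for the null hypothesis that the two
    algorithms have the same error rate.
    """
    chi2 = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    return stats.chi2.sf(chi2, df=1)

def five_by_two_cv_t(diffs):
    """5x2cv paired t test statistic.

    diffs: array of shape (5, 2) holding the error-rate difference
    (A minus B) on each of the 2 folds in each of 5 replications
    of 2-fold cross-validation.
    Returns (t, p); under the null hypothesis, t follows a
    t distribution with 5 degrees of freedom.
    """
    diffs = np.asarray(diffs, dtype=float)
    p_bar = diffs.mean(axis=1)                        # per-replication mean difference
    s2 = ((diffs - p_bar[:, None]) ** 2).sum(axis=1)  # per-replication variance estimate
    t = diffs[0, 0] / np.sqrt(s2.mean())              # numerator: first fold of first replication
    return t, 2 * stats.t.sf(abs(t), df=5)
```

For example, `mcnemar_test(30, 10)` (a strongly asymmetric disagreement pattern) yields a small p-value, while near-equal counts such as `mcnemar_test(12, 10)` do not reject the null hypothesis.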
Similar Resources
Combination of neural and statistical algorithms for supervised classification of remote-sensing images
Various experimental comparisons of algorithms for supervised classification of remote-sensing images have been reported in the literature. Among others, a comparison of neural and statistical classifiers has previously been made by the authors in (Serpico, S.B., Bruzzone, L., Roli, F., 1996. Pattern Recognition Letters 17, 1331–1341). Results of reported experiments have clearly shown that the s...
Full Text

Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms
This paper reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (Type I error). Two widely used statistical tests are shown to have high probability of Type I error in certain s...
Full Text

Combining Labeled and Unlabeled Data for Multi-Class Text Categorization
Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled data, such ...
Full Text

Machine Learning Research: Four Current Directions
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are: (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.
Full Text

Comparison and Combination of Statistical and Neural Network Algorithms for Remote-sensing Image Classification
In recent years, the remote-sensing community has become very interested in applying neural networks to image classification and in comparing neural network performance with that of classical statistical methods. These experimental comparisons pointed out that no single classification algorithm can be regarded as a "panacea". The superiority of one algorithm over the other strongly depends ...
Full Text